Two-Autoencoder "Iterative" Training: Experiment Results

Complete Structure

Autoencoder Structure

  • Activations
    • Hidden: elu
    • Output: tanh
  • Loss: MSE
  • Batch: 256

Classifier

  • Activations
    • Hidden: relu
    • Output: softmax
  • Loss: Categorical Crossentropy
  • Batch: 32

MNIST Data with Noising

Training structure (Decision Tree)

Noisy, one level

Randomly renoised MNIST test data set, $10k$ examples.

Noisy, multiple levels

MNIST test data set, concatenated copies renoised with different noise probabilities from $0\%, 10\%, \cdots, 100\%$: $11 \times 10000 = 110k$ examples.
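As a sketch of how such a noised test set could be built (the exact noise model is an assumption here: each pixel is independently replaced by a uniform random value with probability $p$):

```python
import numpy as np

def renoise(images, p, seed=None):
    """With probability p, replace each pixel by a uniform random value.

    images: array of shape (n_examples, 784) with values in [0, 1].
    The pixel-replacement noise model is an assumption, not taken
    from the original experiments.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(images.shape) < p   # pixels to corrupt
    noise = rng.random(images.shape)      # replacement values
    return np.where(mask, noise, images)

def multi_level_test_set(images):
    """Concatenate 11 copies renoised at p = 0.0, 0.1, ..., 1.0."""
    return np.concatenate([renoise(images, p / 10) for p in range(11)])
```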

Further tested hypotheses

Different types and levels of renoising between autoencoders only while training:

  • Constant level: all examples are renoised with the same probability $\in \{0\%, 10\%, \cdots, 100\%\}$
  • Mixed: each example is independently renoised with a probability $\in \{0\%, 10\%, \cdots, 100\%\}$
  • Multi: only used for "Noisy, multiple levels"; renoises each of the $11$ groups of $10000$ examples with one of the probabilities $\in \{0\%, 10\%, \cdots, 100\%\}$
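The three schemes differ only in how a noise level is assigned to each example; a minimal sketch of that assignment (a hypothetical helper, not code from the experiments):

```python
import numpy as np

LEVELS = np.arange(11) / 10  # 0%, 10%, ..., 100%

def assign_levels(n_examples, scheme, seed=None):
    """Return one noise probability per example (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    if scheme == "constant":
        # one level drawn once and shared by all examples
        return np.full(n_examples, rng.choice(LEVELS))
    if scheme == "mixed":
        # every example gets an independently drawn level
        return rng.choice(LEVELS, size=n_examples)
    if scheme == "multi":
        # consecutive blocks of n_examples // 11 examples per level
        return np.repeat(LEVELS, n_examples // len(LEVELS))
    raise ValueError(f"unknown scheme: {scheme}")
```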

Renoise in evaluation:

While evaluating, the prediction of the first autoencoder is renoised before being passed to the second autoencoder. The results were worse.

Visual presentation of the best results

Input data: Original MNIST

Input data: MNIST with 20% noise

Input data: MNIST with 50% noise

Input data: MNIST with 80% noise

Input data: MNIST with 100% noise

Input data: MNIST with mixed noise levels

(Each caption was followed by an image grid of reconstructions in the notebook export.)

Evaluation comments

For the metrics the output $y \in [-1,1]$ was transformed to $\hat{y} \in [0,1]$ and then rounded to two decimal places, since we want to better observe the difference between runs rather than the absolute value.
For the classifier the data was not converted.
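The rescaling step can be written as a one-liner (a minimal sketch with numpy):

```python
import numpy as np

def rescale_for_metrics(y):
    """Map tanh outputs y in [-1, 1] to [0, 1], rounded to two decimals."""
    return np.round((y + 1.0) / 2.0, 2)
```

For example, `rescale_for_metrics(np.array([-1.0, 0.0, 1.0]))` gives `[0.0, 0.5, 1.0]`.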




absolute $\epsilon$-accuracy


diff <- |target_set - predicted_set|              // pixelwise absolute difference, shape (10000, 784)
accu <- 0                                         // accumulator
loop elem in diff                                 // for each image vector of shape (784,)
    accu <- accu + |{i : elem[i] < ε}| / 784      // fraction of pixels with error below ε

accu <- accu / 10000                              // average over the 10000 examples

squared $\epsilon$-accuracy


diff <- (target_set - predicted_set)^2            // pixelwise squared difference, shape (10000, 784)
accu <- 0                                         // accumulator
loop elem in diff                                 // for each image vector of shape (784,)
    accu <- accu + |{i : elem[i] < ε}| / 784      // fraction of pixels with squared error below ε

accu <- accu / 10000                              // average over the 10000 examples
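A runnable numpy version of both pseudocode blocks might look like this (vectorizing the loop; `squared=False` gives the absolute case, `squared=True` the squared case):

```python
import numpy as np

def epsilon_accuracy(targets, preds, eps, squared=False):
    """Fraction of pixels whose (squared) error is below eps,
    averaged first over the 784 pixels, then over the examples."""
    diff = np.abs(targets - preds)
    if squared:
        diff = diff ** 2
    per_image = np.mean(diff < eps, axis=1)  # fraction of pixels within eps
    return float(np.mean(per_image))
```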

evaluations

$\epsilon$-accuracies for $\epsilon= 1/512$, over noise and outliers for $\epsilon= 1/4$ over noise.

$\epsilon$-accuracies for $\epsilon= 1/256$, over noise and outliers for $\epsilon= 1/2$ over noise.

$\epsilon$-accuracies for $\epsilon= 1/100$, over noise and outliers for $\epsilon= 1$ over noise.

$\epsilon$-accuracies for $\epsilon= 1/10$, over noise and outliers for $\epsilon= 1.5$ over noise.

mse: mean squared error

$\operatorname{mse} = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n} \| Y_j-\hat{Y}_j \|^2 = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n} \sum_{i=1}^{n} (y_{ji}-\hat{y}_{ji})^2$
where:

  • $m$: number of examples, i.e. $10000$
  • $\mathbb{Y}=\{Y_1, Y_2, \cdots, Y_m\}$: target image set, i.e. a set of $m$ images, each represented by a vector of length $n$
  • $\hat{\mathbb{Y}}=\{\hat{Y}_1, \hat{Y}_2, \cdots, \hat{Y}_m\}$: predicted image set, defined analogously
  • $n$: number of pixels, i.e. $784$
  • $Y_j=(y_{j1}, y_{j2}, \cdots, y_{jn})$: target data of image $j$, a $(784, 1)$ vector
  • $\hat{Y}_j=(\hat{y}_{j1}, \hat{y}_{j2}, \cdots, \hat{y}_{jn})$: prediction for image $j$, a $(784, 1)$ vector

mae: mean absolute error

$\operatorname{mae} = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n} \| Y_j-\hat{Y}_j \|_1 = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n} \sum_{i=1}^{n} |y_{ji}-\hat{y}_{ji}|$
where the notation is the same as for mse above.
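Both metrics can be computed directly with numpy (a minimal sketch):

```python
import numpy as np

def mse(Y, Y_hat):
    """Per-image pixel-mean squared error, averaged over examples."""
    return float(np.mean(np.mean((Y - Y_hat) ** 2, axis=1)))

def mae(Y, Y_hat):
    """Per-image pixel-mean absolute error, averaged over examples."""
    return float(np.mean(np.mean(np.abs(Y - Y_hat), axis=1)))
```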

Average maximal difference

Average over examples of the maximal absolute difference between pixels within each example, i.e. $\frac{1}{m}\sum_{j=1}^{m} \max_{i} |y_{ji}-\hat{y}_{ji}|$

Maximal difference

Maximum over examples of the maximal absolute difference between pixels within each example, i.e. $\max_{j} \max_{i} |y_{ji}-\hat{y}_{ji}|$

where the notation is the same as for mse above.
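In numpy, the two maximal-difference metrics reduce to (a minimal sketch):

```python
import numpy as np

def avg_max_diff(Y, Y_hat):
    """Mean over examples of each image's maximal absolute pixel error."""
    return float(np.mean(np.max(np.abs(Y - Y_hat), axis=1)))

def max_diff(Y, Y_hat):
    """Largest absolute pixel error over the whole image set."""
    return float(np.max(np.abs(Y - Y_hat)))
```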

Remarks

Training and validation data renoising

The validation data used during training was renoised with randomly mixed levels, while the training input data was renoised with a single fixed noise level, the idea being to simulate a real-world case in validation. However, during the iterative renoising between the autoencoders, the predictions on the validation data and the training input data were renoised with the same level, instead of the validation set being randomly renoised.

Accuracy

A higher $\epsilon$-accuracy means that more pixels match the target within $\epsilon$, but the remaining mismatched pixels can show a larger difference.
